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Abstract 

Models written in description languages such as CellML are becoming a popular solution to the handling of complex 
cellular physiological models in biological function simulations. However, in order to fully simulate a model, boundary 
conditions and ordinary differential equation (ODE) solving schemes have to be combined with it. Though boundary 
conditions can be described in CellML, it is difficult to explicitly specify ODE solving schemes using existing tools. In 
this study, we define an ODE solving scheme description language-based on XML and propose a code generation 
system for biological function simulations. In the proposed system, biological simulation programs using various ODE 
solving schemes can be easily generated. We designed a two-stage approach where the system generates the 
equation set associating the physiological model variable values at a certain time t with values at t + At in the first 
stage. The second stage generates the simulation code for the model. This approach enables the flexible construction 
of code generation modules that can support complex sets of formulas. We evaluate the relationship between models 
and their calculation accuracies by simulating complex biological models using various ODE solving schemes. Using 
the FHN model simulation, results showed good qualitative and quantitative correspondence with the theoretical 
predictions. Results for the Luo-Rudy 1 991 model showed that only first order precision was achieved. In addition, 
running the generated code in parallel on a GPU made it possible to speed up the calculation time by a factor of 50. 
The CellML Compiler source code is available for download at http://sourceforge.net/projects/cellmlcompiler. 



Introduction 

In recent years, the continued development in computer 
processing power paved the way for the increased use of 
biological function simulation. Computers have proven 
to be invaluable in analysing complex and nonintuitive 
biological models and biologists are turning to them to 
complement their experiments. Simulations enable the 
testing of experimentally unfeasible scenarios and can 
potentially reduce experimental costs. However, the num- 
ber and complexity of physiological models has also 
grown with the increase in computing performance. This 
creates challenges in reproducing simulated behaviours 
of the published models and reuse of models by other 
researchers, hindering the dissemination of science and 
knowledge integration. 
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One way to address model complexity is to use markup 
language-based model descriptions. Some popular exam- 
ples include CellML [1], SBML (Systems Biology Markup 
Language) [2] and insilicoML [3]. CellML is an open stan- 
dard markup language capable of describing mathematical 
models of cellular functions. SBML is an open interchange 
machine-readable format for representing models of func- 
tions such as metabolism and cell signalling. Meanwhile, 
insilicoML describes mathematical models for biophysi- 
cal objects and incorporates morphological information 
such as shape, angle and position. SED-ML (Simulation 
Experiment Description Language) [4] is another type 
of description language which can encode the informa- 
tion of simulation experiments. These markup languages 
allow researchers to take advantage of the vast amount of 
biological function models using a common set of easily 
readable and versatile description rules. 

Biological and physiological function models are gener- 
ally described by differential equations. A typical simula- 
tion of biological function models consists of three parts: 
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a model equation, a boundary condition, and an ordi- 
nary differential equation (ODE) solver. Model equations 
and boundary conditions can be described using CellML, 
while ODE numerical solutions like Euler and Runge- 
Kutta methods are typically built into the simulation soft- 
ware. However, it is necessary to be flexible in using ODE 
schemes in order to strike a balance between computa- 
tional stability and speed. In addition, those using special 
hardware environments such as massively parallel com- 
puter systems require dedicated proprietary software to 
support their numerical solution needs. Thus, description 
languages like CellML and dedicated simulation software 
are not suitable or practical for flexibly incorporating 
different ODE solving schemes. 

To address the need for more flexibility in creating 
simulation software, we created Time Evolution Calcu- 
lation Markup Language (TecML), a machine-readable 
format for encoding ODE numerical solutions. TecML is 
a description language based on the extensible markup 
language (XML). This description language is designed to 
specify and store the numerical methods that can be used 
for solving the ODEs in biological models. It also allows 
the assignment of boundary conditions into the simula- 
tion experiments. The following sections describe TecML 
and how it is integrated into the proposed code generation 
system, which automatically generates codes for biological 
simulations. 

The target of this study is limited to the use of different 
ODE numerical solutions and their application to models 
described in CellML. We propose an algorithm that allows 
users to change the ODE solution and boundary condi- 
tions of the model according to the computational needs 
of their simulation. To verify the effectiveness of the pro- 
posed system, we generate executable codes for several 
CellML models using a number of ODE numerical solu- 
tions. The system can generate code in several program- 
ming languages and code that runs in both sequential and 
parallel computing environments. Simulations on GPU 
(Graphics Processing Units) were undertaken to show the 
effect of using parallel computing on processing time. 

Biological simulation code generation system 

Summary of simulation code generation system 

The proposed method is composed of two stages 
(Figure 1). In the first stage, the system represents the 
biological model by incorporating an ODE numerical 
solution method into the model's differential equations. 
This creates the equation sets that calculate the time 
evolution of the mathematical model. The second stage 
generates the simulation code for these sets of equations, 
allowing the user to run computer simulations of the 
model in machines with general-purpose compilers. This 
approach enables the flexible construction of code gener- 
ation modules that can support complex sets of equations. 



By generating the code separately for each section of the 
mathematical model, parallel code execution can be easily 
integrated into the simulation. 

To illustrate the capabilities of the proposed system, we 
used the FitzHugh-Nagumo (FHN) excitable media model 
[5] as an example. The model, proposed by R. FitzHugh 
in 1961 and which J. Nagumo et al. created an equivalent 
circuit, is a simplification of the Ho dgkin- Huxley model 
and can be described by the following equations: 

r = x^, (1) 
dx/dt = X — r/3.0 — y -\- a, (2) 
dy/dt = b(x -\- c — d • y)y (3) 

where r, x and y are variables, t indicates time and b, 
c, d are constants. Removable algebraic expressions such 
as equation (1) are often used in biological models to 
improve the models readability. 

As for the ODE numerical solution, we used the Mod- 
ified Euler method to solve the ODEs in (2) and (3). By 
combining a mathematical description of the model above 
with the ODE solution method and boundary conditions, 
we can derive the formula to calculate the time evolution 
of the independent variables x and y. This involves finding 
the solution to the variable vector § using its derivatives, 
temporal variable vector and initial conditions at time t 
given by 

d^/dt=f{U^.i\ (4) 
i=g{U^), (5) 

The rest of this section shows how the proposed sys- 
tem incorporates the Modified Euler method into the 
model equations to calculate for The following set of 
equations represent the numerical solution with boundary 



conditions: 

§0 = (6) 

to = t, (7) 

to=g(to.^o). (8) 

fci =f(to,^o,io), (9) 

$i=§o + /^i-5, (10) 

ti = to^8^ (11) 

H =g(h>^i). (12) 

IC2=f{h.^^.il). (13) 

§2 = ?o+ 2^'^i+^2)-5, (14) 

?m = (15) 



where ^, k and i are the vectors representing the variables 
and differential equations in the biological model. 

By applying the ODE solver described above, we can 
arrive at the set of equations detailing the numerical solu- 
tion for the initial value problem of the FHN model. 
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Inputs 



<RelML> 



derivative Var 

constants 

Atime 



4f 



<CellML> 


HH y- 




J 1. 1 1 




1 T 









< TecML> 




CellML Simulation Code 
Generation System 



Equation Sets 

F(x(t),y{t),x(t + At),y(t + At)) 

x(0) = a 

y(0)=b 



Code Generator 

kll=yl*c; 
kl2 = xl + yl; 
x2 = xl + kll*dt 
y2=yl+kl2^dt 



4 



Output 



C/C++ 
Java 

Javaw/ Big Decimal 
Cuda (Parallel) 



Code 



Figure 1 Code generation system inputs and outputs. Input and output of the Biological Simulation Code Generation System. The inputs are 
composed of a CellML, TecML, and RelML file. The system can generate simulation codes in C/C++, Java, and Cuda C programming language. 



First, from equations (6) and (7), the current values of 
the differential variables at t are assigned as the initial 
values with 



^0 — 
Jo = yu 
to = t. 



(16) 
(17) 
(18) 



Then with equations (8) and (9), the differential and 
nondifferential equations are expanded and evaluated 
using these initial values to obtain 



(19) 



Ki,x =Mto,[xo yoV >ro) 

= xo - ro/3.0 - yo-\-a, (20) 

/<i,y =f2(todxo yoV ^ro) 

= b(xo + c - d -yo). (21) 

Next, the new values of x and y are computed as func- 
tions of a:i and 8 (equations (10) and (11)) and given 
by 

xi = xo-\- Ki^x ' <5, (22) 
yi=yo + Ki,y ' <5, (23) 
ti = to + 8, (24) 

This process is repeated for ri and a: 2 as shown in 
equations (12) and (13); 

n =gi(ti,[xiyi]^)=xl, (25) 

K2,x =Mti, [xi ji]^,ri) 

= xi — ri/3.0 — yi-\- a, (26) 

K2,y =f2{ti, [Xi Jl]^rl) 

= b(xi-\-c-d'yi), (27) 



Finally, the value of the differential variables are 
obtained by advancing the solution from time t to t -\- 8 
(equations (14) and (15)) with 



y2=y0 + ]^{Ki,y + K2,y) ' 8, 

Xt^8 = X2> 
yt+8 = J2. 



(28) 

(29) 

(30) 
(31) 



The variable r of the equation set corresponds to i while 
X and y are the variables in the vector The deriva- 
tives dx/dt and dy/dt are expressed by k in equations 
(20), (21), (26) and (27). 

In order to automatically incorporate the ODE numer- 
ical solution into the CellML model, we introduced 
an ODE solving scheme description language called 
Time Evolution Calculation Markup Language (TecML). 
TecML is an XML-based description language that can be 
used to configure ODE solutions and implement the algo- 
rithm described in equations (6) - (15). Another input of 
the system is the Relation Markup Language (RelML) file. 
The basic role of a RelML file is to relate the variables and 
variable types of the CellML file into their equivalent in 
the TecML file. 

The simulation code generation system uses these 
inputs to generate the set of equations describing the time 
evolution of the variables in biological models. This step 
creates equations (16) - (31) in the FHN example. Once 
the model equations are applied with the ODE numerical 
solution method, the next stage involves the generation of 
the executable code that will do the actual calculations. 
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Description language 

The code generation system requires three inputs, namely, 
the CellML file, TecML file and RelML file. This section 
gives a short introduction of each and how it is used in the 
algorithm and consequently, in the FHN model example. 

CellML model encoding standard 

The CellML [1] language is an open standard based on 
XML for describing mathematical models. It was designed 
to allow scientists to share models even if they are using 
different model-building software. The majority of the 
models stored in its repository are cell representations 
and these include information about the cell structure, 
equations for underlying processes and in some cases, 
boundary conditions. Figure 2 shows how a CellML model 
describes the mathematical equations of the FHN cell 
model. The file lists the input and output variables as well 
as their initial values and underlying equations. Note that 
compact syntax representation is used to show its con- 
tents and is based on Alan Garny s notation for CellML in 
the Cellular Open Resource (COR) software [6]. 

TecML: an ODE solving scheme description language 

TecML (Time Evolution Calculation Markup Language) is 
an XML-based description language designed to describe 
ODE numerical solutions that can be used in biosimu- 
lations. This proposed standard allows the mathematical 
description of solving schemes like the Euler and Runge- 
Kutta methods. It allows integration of numerical meth- 
ods with other description languages like CellML. TecML 
categorizes variables into six different types (Table 1) 
namely, diffvar, derivativevar, arithvar, constvar, timevar, 
and deltatimevar. The variables determined by a rate of 
change with respect to time (§) are referred to as dif- 
ferential variables [diffvar) while their derivatives [k) are 



def model flm as 
def comp Main as 
var t\ {init: 0}; 
var r: {init: 0}] 

varx: {init: -1.501250563778375}] 
var y: {init: -0.376213677498469}; 
var a: {init: 0}; 
var b: {init: 0.03}; 
var c: {init: 1.2}; 
var d: {init: 0.3}; 



dx/dt = X — r/3.0 — y -\- a; 
dy/dt = b(x + c — dy); 
enddef; 
enddef ; 



Table 1 Variable and function types in TecML 



Type Definition 



dlffvor 


Differential variable (variable is a function of 




time) 


arithvar 


Temporal variable (can be substituted with 




math equations) 


derivotivevar 


Derivative of diffvar 


constvar 


Constants 


timevar 


Time variable 


deltatimevar 


Variable denoting the change of time per 




step 


diffequ 


Differential equation 


nondlffequ 


Non-differential equation 



called derivative variables {derivativevar). The removable 
variables {i) are the arithmetic variables {arithvar) and 
variables that do not change in value (^) are the constants 
{constvar). In addition, the time {t) and time increment (5) 
are referred to as timevar and deltatimevar, respectively. 
TecML also divides the mathematical equations into two 
types; namely, differential ifQ) and non-differential {gQ) 
equations. Equations of type {diffequ) are the derivatives 
of a function while {nondiffequ) are the arithmetic func- 
tions. The information and example of a TecML file for 
the Modified Euler method are shown in Figure 3 and 
Figure 4, respectively. 



def method ModifiedEuler as 
def variables odesolvar as 

vartype input: {^t}; 

vartype output: {^o}; 

vartype diffvar: {^o ,6,6}; 

vartype derivativevar: {/^i,At2}; 

vartype arithvar: {^^0,^-1}; 

vartype constvar: {(,}; 

vartype timevar: {r}; 
enddef; 

def equations odesolequ as 
equtype diffequ: {4>{i, i, r, <;')}; 
equtype nondiffequ: {7(6 C)}; 

enddef; 

6 = 6; 

^0 = 7(6:^o,T, C); 
f^i = (/>(6:^o,r, C); 
6 = 6 + ^1 * ^; 
^1 =-f{^i,ii,T + S.X); 
1^2 = 0(6, + C); 
6=6 + {f^i + ^^2) * S; 
6 = 6; 

enddef; 



Figure 3 Information in the TecML file of the Modified Euler 
method where ^ is the diffvar, k is the derivativevar, i; is the 
constvar, and variable type i is the arittivar. 



Figure 2 Information in the CellML file of the FHN model 
(represented in Alan Garny's COR notation). 
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<tecml> 

<inputvar name="xi" type="dif f var " /> 
<outputvar name="xo" type="dif f var " /> 
<variable name="yl" type="arithvar " /> 
<variable naine="kl" type="derivat ivevar " 
<function name="f" type="dif f equ"> 



/> 



<math xmlns= "http : //www . w3 . org/ 1998/Math/MathML " > 
<apply> 

xO = xi (in MathML code) 
</apply> 

</math> 
</tecml> 



Figure 4 TecML file example. 



Relation Markup Language (RelML) 

RelML is a language for describing the correspondence 
between the variables in the CellML model file and the 
variable types in an ODE numerical solution scheme 
described in a TecML file. Figure 5 shows the corre- 
spondence of variables described in the FHN CellML 
file (Figure 2) and the variable types in the TecML file 
(Figure 3). For the FHN model in Figure 2, variables x and 
y are defined as diffvar and their respective derivatives 
dx/dt and dy/dt as derivativevar. In addition, by c, and d 
are constvar type while r is considered a temporal variable 
or arithvar type. Equations (2) and (3) are the formulas for 
calculating the diffvar so the functions are categorized as 
diffequ while the arithmetic equation for r in equation (1) 
is a nondiffequ. 

An example RelML file where the Modified Euler 
method is applied to the FHN model is shown in Figure 6. 
The first two statements after the header indicate the 
filename and location of the corresponding CellML and 
TecML file. The succeeding lines enumerate all the vari- 
ables used in the FHN model and their corresponding 
types. The complete RelML and TecML files used in this 
paper have been published on the Web [7]. 



def connection flmJModifiedEuler as 
def cellml flin as 

filename ** flin. cellml" ; 
enddef : 

def tecml Modified Euler as 

filename " ModifiedEuler. tecml" ; 
enddef: 

def variables cellmlvar as 
vartype diffvar: {x. y}\ 
vartype arithvar: {r}: 
vartype constvar: {a. b. c. d}\ 

enddef: 
enddef: 



Figure 5 Information contained in the RelML file for the FHN 
model and Modified Euler method. 



<relml> 

<cellinl f ilename="model/cellml/fhn. cellml"/> 
<tecml filename="model/tecml/ModifiedEuler. tecml "/> 



<variable 
<variable 
<variable 
<variable 
<variable 
<variable 
<variable 
<variable 
</relml> 



name= 
name= 
name= 
naine= 
name= 
name= 
name= 
naine= 



time" type="timevar" /> 

x" type="dif f var" /> 

y" type="dif f var" /> 

r" type="arithvar" /> 

a" type="constvar" /> 

b" type="constvar" /> 

c" type="constvar" /> 

d" type="constvar" /> 



Figure 6 RelML file example for FHN model and Modified Euler 
method. 



Executable code generation for equation sets 

The system implementation of the code generator allows 
the creation of codes in a number of programming lan- 
guages. Table 2 lists the tested inputs and outputs of 
the system. For a desktop CPU environment, the gener- 
ator can produce source codes in C, Java, and Java with 
BigDecimal library, which can handle higher precision 
computing. Codes can be generated for both single cell 
and cell array simulation. The cell array code genera- 
tor can produce both ID and 2D excitation propagation 
codes. The generated codes for 2D simulation describes 
the excitation propagation in a rectangular array with 
N X M number of cells. Aside from generating single 
CPU codes, it can also create codes suited for parallel 
computing that runs on a GPU machine. 

Methods: simulation code generation algorithm 

Cell model description 

The cell model is a collection of variables and equation 
and can be written as 



k = dx/dt =f(x,y, t, z) 
y = g{x,y, U z) 



(32) 



where xisz. differential variables vector, k is a derivatives 
vector, y is a temporal variables vector, and z is a constants 
vector while t is the scalar variable for time. These variable 
vectors can be expressed as sets with 

X = [x\,X2r " 7^NxV ^ (33) 
k = [kiJ<2."' J<Nf . (34) 

y = [yi>y2r" >yNyV . (35) 

z = [zi,Z2r" .zn.V . (36) 
Table 2 Input and output code types in the system 



Cell Physiology 


FitzHugh-Nagumo model [5] 


Model 


LuoRudy 1991 model [8], Kyoto 2003 Model [9] 


ODE Solution 


Euler Method, Modified Euler Method 


Scheme 


1^^-, 2"^-, 4^^-order Runge-Kutta Method 


Generated codes 


C code, Java, Java with BigDecimal 




Cuda code (GPU) 
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where Nx^ Ny, are the variable count for /c, y, and z, 
respectively. Furthermore, f(Xy y, t, z) and g{x, y, t, z) are 
function vectors with ki =fi(x,yf z) and yi = giix^yy z). 
These function vectors are defined by 

fix, y, t, z) = [/i {x, y,t,z)r-- Jn^ (x, y>t,z)f , (37) 
g(x,y,t,z) = [gi(x,y,t,z),-'- ,gNy(x,y,t,z)]^ . (38) 

ODE solving scheme 

Table 3 shows the elements inside a TecML file. Note 
that all the TecML variables are denoted with Greek let- 
ters. The differential equations for the cell model are 
represented by the terms d^/dt = <^(^,i,t,^) and i = 
T(^,i,t,^). Inside a TecML file, the dependence between 
the differential variables §o (time t) and (time t + 8) is 
given by 

?,. = (7KS,n,5) (0</<7V^), (39) 
Ki^i = 4>(§,, ii, Ti, O (0 < / < 7V^), (40) 
ii = T{^^,iuXui;) (0<i<N^), (41) 
Xi = Ti{t,6) iO<i<N^\ (42) 

where the differential variable vector S and derivative 
variable vector 11 are given by 

S=[§o>§p--->«iV,]^> (43) 



n =[/Cl,/C2,--- ylCNt] . 



(44) 



CellML and TecML integration 

The integration of the CellML model and TecML solving 
scheme involves the mapping of corresponding variables. 



Table 3 Information written 

Notation 



in a TecML file 

Definition 



?0 =[^0,1/^0,2/- • • ,Hq,N^^ 
< / < A/^) 



T = Ti(t,8) 

n =[K],K2," ' ,ICN^f 



Input differential variable vector 
Output differential variable vector 
Time differential variable vector 
Differential variable vector of ^ 
Current tinne value 
Time step 
Constants vector 
Calculation of variable Tj 

Differential equation vector 

Temporal function vector 

Derivative variable vector 
Temporal variable derived from § 
Derivative vector of ic 
Relation between ^/ and S, K, 8 



The mapping shows how each physiological model vari- 
able written in CellML is replaced with its correspond- 
ing TecML variable. The differential variable vector § of 
TecML corresponds to the differential variable vector x of 
CellML and reads as 



(45) 



TecML's derivative variable k equates to the CellML's 
derivative variable dx/dt and the temporal variable i cor- 
responds to y and given by 



ic^^i ^ dxk/dt 



dx^^i dxk;i dxjc^Nx 



^k 



dt dt dt 

yk =[yk,iyyk,2* • • • yyk,NyV • 



(46) 
(47) 



Furthermore, the TecML equations k = ^, ^) and 

I = T(^, I, t, ^) correspond to the CellML equations / and 
gf respectively, as shown by 



r($,^r,?) 



-f(x,y, t, z), 

-g(x,y,t,z). 



(48) 
(49) 



Replacement algorithm 

Once the variables are mapped, the differential and arith- 
metic equations are transformed and expanded according 
to the numerical method described in the TecML file. 
First, the variables in the CellML model equations are 
replaced with their corresponding TecML variables. Each 
model equation is searched for any diffvar, derivativevar, 
arithvar, constvar, timevar, or deltatimevar and that vari- 
able is replaced with the corresponding TecML variable 
name. This replacement procedure is represented in Stra- 
chey brackets and expressed by the following function: 



(50) 

where \equ\ is the TecML equation and, for all / = 
1, 2, A/^, the replacement functions are 

^xM = Ixil (51) 
IZkliCij = Ikil (52) 

nylaj = bj, (53) 
ntlTij = (til (54) 

^zM = Izl (55) 
^dlSj = Idl (56) 

Note that the Strachey brackets in 7^^|$ J = {xi} means 
that all the TecML diffvar {i.e. for / = 1 ... AT^) in the argu- 
ment is replaced with the corresponding CellML diffvar 
Xi. This function is true for all the other variable types in 
TecML. 

The generated set of equations advances the solu- 
tion of the differential variables from time t to t -\- 8. 
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Figure 7 shows how the algorithm generates the simula- 
tion program from the input files. Note that the subrou- 
tine replace_v ( ) in the algorithm replaces variables in 
the CellML model equation with the TecML variables as 
expressed in equation (50). The function replace_s j ( ) 
generates one scalar equation from a vector equation and 
appends index j to the variables. In the equations which 
do not contain functions / or ^, replace_d() gener- 
ates multiple scalar equations from a vector equation. 
Finally, rep lace _f ( ) and replace_g ( ) unfold func- 
tions / and respectively. These subroutines can be 
expressed in the following transformations: 

IZg^jlequJ = makeEquation|<Sy,y|<Sy^,y|getLHS|e(5^w]]], 

(57) 

IZdjlequj = SkjlS^^jlequjl (58) 
7^/[(/)y(?,^r,<)l = ry,iir^,klfj(^.y,t,z)jl (59) 
^^Ik/C^^^.^)! = Ty,ilTxj<lgj(x,yJ,z)jl (60) 



where the index-appending functions S and unfolding 
functions 7" are defined as 



^yjbii = biA> (62) 

^f,jm = m> (63) 

^g,jm = Irjl (64) 

TM = bkl (65) 

TyjM = bkl (66) 



Experiments and results 

Code generation for the cell model simulation 

The FHN cell model [5] was used to evaluate the proposed 
system. The system was also used to generate simulation 
codes for a number of cell models (LuoRudyl991 [8], Luo- 
Rudyl994 [10] and KyotoModel2003 [9]) and with other 
more accurate ODE numerical methods (e.g. 4th-order 
Runge-Kutta method). All the generated code used in the 
simulation experiments can be downloaded from the files 
section of the program generator site [7]. 



generate_equations(){ 

Equation equation_set[] all equations in TecML: 

Equation equation_set2[] <— null: 

int equ2-count <— 0: 

for i -<— 1 to /en(;i/i[equation_set] do 

equation_set[/] replace_v(equation_set[/]): 
if equation-setf/] includes 4> then 
for j 1 to N.r do 

equation_set2[e(7i/2_couni++] <— 

replace_s j (eq uation_set[?] . j): 

end 

else if equation_set[i] includes T then 
for j ■<— 1 to Ny do 

equation_set2[eqw2_couni++] 

replace_sj(equation_set[/]. j): 

end 

else 

for j •<— 1 to N,j- do 

equation_set2[e(/w2_coiint++] <— 

replace_d(equation_set[i], j); 

end 

end 

for ? •<— 1 to /en(7f/?[equation_set2] do 
if equation_set2[/] includes f then 
for j ^ 1 to N.r do 
equation_set2[/] 

replace_f (equation_set2[/] . j): 

end 

else if equation_set2[/] includes g then 
for J •<— 1 to A',, do 
equation_set2[j'] 

replace_g(equation_set2[/]. j): 

end 

end 

return equation_set2: 

} 

replace_v(Equation equ) { 
for ? 0 to do 

replace all with a;, in equ; 
replace all with k, in equ; 
replace all tj with y, in equ; 
replace all r,; with tj in equ; 

end 

replace all <^ with z in equ; 
replace all <5 with d in equ: 
return equ: 

} 



replace.sj (Equation equ. j) { 
Ihs getLHS(equ): 
rhs getRHS(equ); 
replace all fc, with kij in Ihs; 
replace all with yij in Ihs; 
replace all 4> with (pj in rhs; 
replace all T with 7j in rhs; 
equ <— mcLkeEquation(lhs, rhs); 
return equ; 

} 

replace_d(Equation equ. j) { 
replace all xi with .r, j in equ; 
replace all kj with kj j in equ; 
return equ; 

} 

replace_f (Eciuation equ. j) { 
Ihs getLHS(equ); 
rhs getRHS(equ): 

replace (pj{xi,..yiJ,n . z) with fj{x.y.t,u.z) in rhs; 
for / -<— 1 to Nx do 

replace all Xj with x^. j in rhs; 

end 

for i ^ 1 to Ny do 

replace all y, with y/., in rhs: 

end 

equ meLkeEquation(lhs. rhs); 
return equ; 

} 

replace_g(Eciuation equ. j) { 
Ihs getLHS(equ): 
rhs getRHS(equ): 

replace ')j{x^..yi.t,„.z) with gj{x,y.t,„ , z) in rhs: 
for / <- 1 to iV, do 

replace all .r, with .r^.., in rhs; 

end 

for / ■<— 1 to Nf, do 

replace all y, with y; , in rhs; 

end 

equ <— meLkeEquation(lhs. rhs); 
return equ; 



Figure 7 Algorithm for generating the set of ODE numerical solution equations. 
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The traditional approach to creating simulation exper- 
iment codes also offers little flexibility once the soft- 
ware is created. Changing the numerical ODE solution 
method means making major revisions in the simulation 
code. The proposed method allows the flexibility of using 
different simulation models, boundary conditions, and 
numerical methods for a simulation experiment. Since 
the simulation codes are generated automatically, users 
can choose or change their desired ODE solver without 
making changes in the simulation codes themselves. Also, 
this can give clear information on what is calculated to 
generate the simulation results. 

The different cell models and ODE numerical methods 
produced varying simulation code sizes. Table 4 lists the 
number of execution steps generated for the cell models 
using different ODE solutions. The table shows that the 
more complex the model becomes and the more equations 
it has, the larger the number of steps to compute the 
model equations in the code. 

An issue in approximating ODE solutions is the accu- 
racy of the numerical method used. A good approxima- 
tion to the underlying differential equation needs to be 
achieved in order to arrive at accurate simulation results. 
We tested a number of commonly used ODE numerical 
methods to determine how the use of different solutions 
affect the accuracy of the calculations. The three methods 
used were Euler, Modified Euler and 4th-order Runge- 
Kutta. Each of these methods were used to generate an 
FHN model simulation code that runs in a single CPU. 
The generated code can run in different compilers and 
does not require third party software. 

In order to test the accuracy of the ODE numerical 
methods, simulation codes were generated in Java using 
the BigDecimal class and numeric formatting. The Java 
BigDecimal can represent a large number of decimal 
places and help avoid rounding errors. It can offer higher 
precisions than the 16 decimal digits offered by floating 
point double. In the simulation codes, we used BigDeci- 
mal to represent the numbers in a 32-digit decimal point 
precision format for all the ODE solving methods. Differ- 
ent time steps were also used in testing the accuracy of 
these ODE methods, ranging from 10~^ ms to 10~^ ms. 
The simulation using a time step of 10~^ and Runge-Kutta 

Table 4 The number of execution steps for the generated 
codes for different cell models and ODE numerical 



methods 


Cell Model 


Euler 


Modified Euler 


Runge-Kutta 


FHN 


11 


16 


26 


LuoRudyl991 


70 


117 


211 


LuoRudyl994 


123 


211 


387 


Kyoto Model 


335 


560 


1200 



as ODE solving scheme was used as the basis to com- 
pute for the root-mean-square error (RMSE) and evaluate 
the accuracy of the other calculations. The RMSE was 
computed for all the calculations using different ODE 
numerical methods and in varying time steps (Figure 8). 
Each level in the RMSE indicates a 1/10 less accuracy in 
the simulation results compared to the calculations using 
Runge-Kutta and 10~^ ms time step. 

The first-order Euler method gave the largest error 
(log error ^ 10~^) while the Modified Euler resulted to 
a smaller error. The fourth-order Runge-Kutta method 
resulted with the best accuracy (log error ^ 10~^^). How- 
ever, the Runge-Kutta error is almost constant from the 
time step 10~^ ms, . This can be attributed to the rounding 
error of digits over the used 32-digit precision. 

Simulation codes using different ODE numerical meth- 
ods were also generated for models more complicated 
than the FHN or Hodgkin-Huxley model. For this, we 
used the model introduced by C. Luo and Y. Rudy in 
1991 [8]. It is a simple cell model of cardiac action poten- 
tial that uses Hodgkin-Huxley type equations to calculate 
ionic currents. The RMSE computations undertaken for 
the FHN model were also applied for the Luo-Rudy 1991 
simulations. Codes were generated for the three ODE 
solutions with time steps ranging from 10~^ to 10""^ ms. 
Meanwhile, the simulation using the Euler method with a 
time step of 10~^ ms was used as the reference for RMSE 
calculations (Figure 9). 

Excitation propagation simulations 

One possible application of 1- or 2-dimensional excitation 
propagation simulation is cardiac arrhythmia research. 
Computational models have provided new insights into 
the underlying mechanisms of re-entry like the role of 



^ 10^ 




time step [msec] 



Figure 8 Relationship between the time step and calculation 
error for each ODE numerical method used in generating the 
FHN model. The results and root-mean-square errors (RMSE) are 
computed using Java BigDecimal. The result of the Runge-Kutta 
method with 1 0"^ ms time step is used as reference for the RMSE 
computations. 
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10° 



10' 



10-^ 

time step [msec] 



10" 



10^ 



Figure 9 Relationship between the time step and calculation 
error for each ODE numerical method used in generating the 
LuoRudy 1991 model. 




5 7 9 11 13 
Number of Cell (x 10240) 

Figure 10 CPU and GPU execution time ratio of the excitation 
propagation simulation. 



ionic currents and ion channel mutations [11]. In addi- 
tion, designs of defibrillation treatments can be optimized 
using simulations of excitation propagation to achieve 
good clinical results for patients. 

Simulation codes for the Luo-Rudy 1991 model were 
generated to simulate cell excitation propagation on a 
2D homogeneous sheet. The two-dimensional cell action 
potential propagation simulations were run for both CPU 
and GPU to compare the speed of calculations. The CPU 
simulation was performed with an intel Core i7 880 pro- 
cessor with 8 GB of memory running Windows 7. The 
GPU has a single Nvidia Tesla C2050 processor with 448 
CUDA cores, and 3 GB memory in a CentOS 5.5 system. 
The programs are written in C for the CPU simulations 
and Cuda C for the GPU. 

In the experiments, the size of the homogeneous sheet 
was varied from 10240 (10240 x 1) to 1433600 (10240 
X 14) cells with 10240-cell increments. The computa- 
tion time was measured for each increment. Figure 10 
shows the CPU and GPU computational time at differ- 
ent cell array sizes. Results showed that by using the GPU, 
the computational time can be accelerated 50 times as 
compared to using a standard CPU. 

Discussion 

Traditionally, biological function simulation programs 
have to be manually created from scratch. This is manage- 
able with small models or simulation experiments but it 
becomes less practical once model complexity increases. 
The more complex the model becomes the larger the 
program becomes and the harder it is to create and main- 
tain such programs. The Doty model [12] showed that 
the cost of software development is proportional to the 
square exponential of the number of computational steps. 
As seen from Table 4, the number of computational steps 
increases with the complexity of the model. By using 
this system, the creation of a simulation program from 



biological models becomes simpler. The system takes 
input from experts in biological functions through CellML 
models and mathematical experts through ODE numeri- 
cal solutions and automatically generates the simulation 
code. The automatic code generation allows it to deal 
with large and complex models without the proportional 
increase in software development cost. This also keeps the 
software maintenance cost low since the programs are cre- 
ated with the same structure, regardless of the biological 
model under study. 

The systems support for multiple programming lan- 
guages makes it easier for users to test the model in 
different programming environments and keep the cost 
of switching from one programming language to a low. 
Based on the codes generated for the purpose of this 
paper, science and engineering students typically create a 
Java simulation code from the C version within four days. 

The system requires the formatted RelML file to do the 
first stage of the algorithm and the RelML information 
can also be used to automatically determine the bound- 
ary conditions. The RelML information can be extracted 
automatically from the CellML model file but boundary 
conditions are not always available. We need a mathemati- 
cal representation of boundary conditions which are com- 
patible with declarative description. This can be addressed 
by the inclusion of PEPML experimental protocol [13] in 
the system. PEPML is an XML-based language that can 
describe the initial conditions and procedures of an exper- 
imental protocol. It describes boundary conditions in a 
mathematical form and is purely represented in a declar- 
ative manner, making it easy to combine with the current 
implementation of our system. 

The attainable order of accuracy for approximating 
the Luo-Rudy 1991 model equations was determined 
by running simulations of the model using different 
ODE numerical methods. The results of the comparisons 
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between ODE solving schemes showed that the model can 
be accurately predicted by the first-order ODE solutions. 
The use of higher-order numerical solutions has a mini- 
mal effect on the values of the approximations (Figure 9). 
The Luo-Rudy model is a set of nonlinear differential 
equations that use higher order ODE solutions but it can 
be seen from the Taylor series expansion that equations 
from the second-order term has little or negligible effect. 
The structure of the system, where the ODE solver can 
be easily changed with the appropriate TecML file, allows 
one to confirm the accuracy of the model simulation such 
as the case with the Luo-Rudy 1991 model (see Additional 
file 1). 

Although the system is fully implemented, further 
improvements are still to be added. The generated excita- 
tion propagation code does not consider the optimization 
of the model variables for parallel processing. Therefore, 
code parallelization is low and may have contributed to 
slower execution in the GPU environment. Also, the sys- 
tem can only handle equations with a single term in the 
left-hand side. Other equation forms will be addressed in 
future implementations. 

Furthermore, simultaneous calculations using differ- 
ential algebraic equations (DAE) are still not available. 
Simultaneous equations appear in some models and 
implicit numerical methods. The handling of implicit 
methods is available in the inputs but not in the cur- 
rent implementation of our system. A simultaneous 
equation library, other ODE numerical solutions library 
like CVODE, or a linear algebra library like LINPACK 
could be incorporated into our system. This can be a 
possibility with our plan to design a library interface 
description language to handle numerical solution inputs. 

Future implementations also need to address the need 
for a declarative design of describing procedural methods. 
Complex ODE numerical methods like the ones described 
in the Kinetic Simulation Algorithm Ontology (KiSAO) 
[14] are difficult or impossible to encode in the current 
version of TecML. We are in the process of redesigning 
TecML and the system to incorporate methods that are 
suitable for the types of problems scientists are addressing 
with manually-created code, including adaptive methods. 
This can greatly enhance the usability of the proposed 
method. 

In addition, the system may be integrated with SED-ML. 
SED-ML allows description of the simulation environ- 
ment, which includes the name of the numerical method 
to be used in the simulation. A starting point for inte- 
gration would be to provide TecML information of a 
numerical method for the KiSAO entry in SED-ML to 
allow simulation software to refer to this description. 
Another point would be to create support for SED-ML 
files in the code generation system in order to create more 
complex experiments that use more than one model or 



experiments with different simulation methods applied. 
This will allow users the flexibility of choosing not only the 
ODE solving methods but also the experiment protocol 
when generating their simulation codes. 

Conclusion 

In this paper, we proposed a method to automatically 
generate executable simulation codes using CellML phys- 
iological models and ODE numerical solution methods. 
The generated code describes the time-evolution of the 
set of differential equations enumerated by the CellML 
model. The code generation system is composed of a two- 
stage approach that allows flexible generation of complex 
sets of equations. 

To evaluate the effectiveness of the proposed system, 
several combinations of physiological cell models and 
ODE solving schemes were generated. The output of the 
numerical approximations were in accordance with the 
published results of the cell models. Results for the Luo- 
Rudy 1991 simulations also indicated that it only has 
first-order accuracy. The comparison of execution time 
for 2D excitation propagation also showed that the use of 
GPU can accelerate the processing time by 50 times as 
compared to a CPU. 

The code generation system allows executable simula- 
tion codes to be easily generated from CellML and TecML 
files. This can be very useful in the field of biological 
model simulation since it provides the tools to quan- 
titatively evaluate the mathematical equations in these 
models. 

Additional file 



Additional file 1: Appendix. 
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